Introduction


For the third quarter of this research contract, we report progress on
the following four tasks (as described in the contract):

2. Feature Calculation;

3. Membership Calculation;

4. Clustering Methods (including initial experiments on pose estimation);

5. Acquisition of images (including camera calibration information for
digitization of the model).

The report, as in the past, consists of "stand alone" sections describing
the activities in each task. We would like to highlight that during this
quarter we believe we have made a major breakthrough in the area of fuzzy clustering.
We have discovered a method to remove the probabilistic constraint that the
memberships across all classes must sum to 1 (as in the fuzzy c-means). A paper
describing this approach is included (it is under review for the IEEE Transactions on Fuzzy
Systems).



Feature Calculation 


We have acquired images, digitized from video tape, and have begun the process of
feature extraction and segmentation. We have concentrated on texture-based and
edge-based features. On subsequent pages, we describe the 14 texture features, which are
calculated from the gray-tone spatial dependence matrices. We then show a typical image of
the shuttle, with earth as background, followed by images of various texture features
extracted from the image. The resultant images show that some of the features
are good discriminators while others are quite poor. Following those images, we show the
results of segmenting this image using the "threshold unit" approach described in the
second quarter report.



The gray-tone spatial dependence matrix $P(i,j)$ is computed from a window of
size $L_y \times L_x$. Let $d$ be the distance between the two pixels in the window and
$I(k,l)$ the gray level at pixel $(k,l)$; then we have

\[
\begin{aligned}
P(i,j,d,0^\circ) = \#\{&((k,l),(m,n)) \in (L_y \times L_x) \times (L_y \times L_x) \mid k-m=0,\ |l-n|=d,\\
&I(k,l)=i,\ I(m,n)=j\}\\
P(i,j,d,45^\circ) = \#\{&((k,l),(m,n)) \in (L_y \times L_x) \times (L_y \times L_x) \mid (k-m=d,\ l-n=-d)\\
&\text{or } (k-m=-d,\ l-n=d),\ I(k,l)=i,\ I(m,n)=j\}\\
P(i,j,d,90^\circ) = \#\{&((k,l),(m,n)) \in (L_y \times L_x) \times (L_y \times L_x) \mid |k-m|=d,\ l-n=0,\\
&I(k,l)=i,\ I(m,n)=j\}\\
P(i,j,d,135^\circ) = \#\{&((k,l),(m,n)) \in (L_y \times L_x) \times (L_y \times L_x) \mid (k-m=d,\ l-n=d)\\
&\text{or } (k-m=-d,\ l-n=-d),\ I(k,l)=i,\ I(m,n)=j\}
\end{aligned}
\]

where # denotes the number of elements in the set.
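The set definitions above can be read as a counting procedure over ordered pixel pairs. The following is a minimal Python sketch of that procedure, not the implementation used for this report. The function name `glcm`, the nested-list image representation, and the assumption that gray levels are integers in `0 .. n_levels-1` are all illustrative choices.

```python
# Offset (row step, column step) from the first pixel to the second for each
# angle; both signs are counted, matching the "or" in the definitions.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (1, 0), 135: (1, 1)}

def glcm(image, n_levels, d=1, angle=0):
    """Count ordered pixel pairs at distance d along the given angle.

    image is a list of rows of integer gray levels (hypothetical layout).
    Returns the unnormalized matrix P as a list of lists.
    """
    dk, dl = OFFSETS[angle]
    rows, cols = len(image), len(image[0])
    P = [[0] * n_levels for _ in range(n_levels)]
    for k in range(rows):
        for l in range(cols):
            for sign in (1, -1):
                m, n = k + sign * d * dk, l + sign * d * dl
                if 0 <= m < rows and 0 <= n < cols:
                    P[image[k][l]][image[m][n]] += 1
    return P

window = [[0, 0, 1],
          [1, 1, 0]]
P0 = glcm(window, n_levels=2, d=1, angle=0)
P90 = glcm(window, n_levels=2, d=1, angle=90)
```

Because each pair is counted in both orders, the resulting matrix is symmetric.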

The following notation is used to compute the 14 texture features.

Notation

$p(i,j)$: $(i,j)$th entry in a normalized gray-tone spatial-dependence matrix,
$p(i,j) = P(i,j)/R$, where $R$ is a normalization factor.

$p_x(i)$: $i$th entry in the marginal-probability matrix obtained by summing the rows
of $p(i,j)$, $p_x(i) = \sum_{j=1}^{N_g} p(i,j)$.

$N_g$: number of distinct gray levels in the quantized image.

$\sum_i$ and $\sum_j$ denote $\sum_{i=1}^{N_g}$ and $\sum_{j=1}^{N_g}$, respectively.

\[
p_y(j) = \sum_{i=1}^{N_g} p(i,j)
\]

\[
p_{x+y}(k) = \sum_{i=1}^{N_g} \sum_{\substack{j=1 \\ i+j=k}}^{N_g} p(i,j), \qquad k = 2, 3, \ldots, 2N_g
\]

\[
p_{x-y}(k) = \sum_{i=1}^{N_g} \sum_{\substack{j=1 \\ |i-j|=k}}^{N_g} p(i,j), \qquad k = 0, 1, \ldots, N_g - 1
\]

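The 14 texture features are defined as follows.

1) Angular Second Moment:

\[ f_1 = \sum_i \sum_j \{p(i,j)\}^2 \]

2) Contrast:

\[ f_2 = \sum_{n=0}^{N_g-1} n^2\, p_{x-y}(n) \]

3) Correlation:

\[ f_3 = \frac{\sum_i \sum_j (ij)\, p(i,j) - \mu_x \mu_y}{\sigma_x \sigma_y} \]

where $\mu_x$, $\mu_y$, $\sigma_x$, and $\sigma_y$ are the means and standard deviations of $p_x$ and $p_y$.

4) Sum of Squares: Variance

\[ f_4 = \sum_i \sum_j (i - \mu)^2\, p(i,j) \]

5) Inverse Difference Moment:

\[ f_5 = \sum_i \sum_j \frac{1}{1 + (i-j)^2}\, p(i,j) \]

6) Sum Average:

\[ f_6 = \sum_{i=2}^{2N_g} i\, p_{x+y}(i) \]

7) Sum Variance:

\[ f_7 = \sum_{i=2}^{2N_g} (i - f_8)^2\, p_{x+y}(i) \]

8) Sum Entropy:

\[ f_8 = -\sum_{i=2}^{2N_g} p_{x+y}(i) \log\{p_{x+y}(i)\} \]

9) Entropy:

\[ f_9 = -\sum_i \sum_j p(i,j) \log(p(i,j)) \]

10) Difference Variance:

\[ f_{10} = \text{variance of } p_{x-y} \]

11) Difference Entropy:

\[ f_{11} = -\sum_{i=0}^{N_g-1} p_{x-y}(i) \log\{p_{x-y}(i)\} \]

12), 13) Information Measures of Correlation:

\[ f_{12} = \frac{HXY - HXY1}{\max\{HX, HY\}} \]

\[ f_{13} = \left(1 - \exp[-2.0\,(HXY2 - HXY)]\right)^{1/2} \]

\[ HXY = -\sum_i \sum_j p(i,j) \log(p(i,j)) \]

where $HX$ and $HY$ are the entropies of $p_x$ and $p_y$, and

\[ HXY1 = -\sum_i \sum_j p(i,j) \log\{p_x(i)\, p_y(j)\} \]

\[ HXY2 = -\sum_i \sum_j p_x(i)\, p_y(j) \log\{p_x(i)\, p_y(j)\} \]

14) Maximal Correlation Coefficient:

\[ f_{14} = (\text{second largest eigenvalue of } Q)^{1/2} \]

where

\[ Q(i,j) = \sum_k \frac{p(i,k)\, p(j,k)}{p_x(i)\, p_y(k)} \]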
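To make the feature definitions concrete, the following Python sketch evaluates four of them (angular second moment, contrast, inverse difference moment, and entropy) directly from an unnormalized matrix P. It is only an illustration: the function name `texture_features`, the use of the sum of all entries as the normalization factor R, and the natural logarithm are assumptions not fixed by the text.

```python
import math

def texture_features(P):
    """Compute f1, f2, f5 and f9 from an unnormalized square matrix P."""
    n = len(P)
    R = sum(sum(row) for row in P)  # assumed normalization: total pair count
    p = [[P[i][j] / R for j in range(n)] for i in range(n)]
    asm = sum(p[i][j] ** 2 for i in range(n) for j in range(n))
    # Summing (i - j)^2 p(i,j) equals summing n^2 p_{x-y}(n) over n.
    contrast = sum((i - j) ** 2 * p[i][j] for i in range(n) for j in range(n))
    idm = sum(p[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n))
    entropy = -sum(p[i][j] * math.log(p[i][j])
                   for i in range(n) for j in range(n) if p[i][j] > 0)
    return {"asm": asm, "contrast": contrast, "idm": idm, "entropy": entropy}

features = texture_features([[2, 2], [2, 2]])
```

Zero entries are skipped in the entropy sum, following the usual convention that 0 log 0 is taken as 0.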




[Figure: (a) original image; (b) homogeneity; (c) entropy; (d) angular second moment]

[Figure: (a) contrast; (b) correlation; (c) sum of squares; (d) sum average]

[Figure: (a) sum variance; (b) sum entropy; (c) difference variance; (d) difference entropy]

[Figure: (a) information measure of correlation, method 1; (b) information measure of correlation, method 2; (c) maximal correlation coefficient]





Calculation of Membership Functions 


Our work in this area has progressed nicely. We have designed and implemented 
numerous algorithms to generate membership values from a set of training data using 
histograms, results of fuzzy clustering, and heuristic definitions. We have also made 
progress in the transformation of "probability density functions" into possibility 
distributions for use in assigning membership values to individual points. The following 
report describes three methods of converting histograms of features into possibility 
distributions from which we can calculate membership function values for segmentation 
and recognition. The last example demonstrates these techniques on the "homogeneity" 
feature for the shuttle and background on our sample image. 




Methods for Generating Membership Functions:


Fuzzy set theory has been used extensively in the literature for decision making,
particularly in situations involving uncertain, vague and imprecise data supplied by
heterogeneous sources. Many of these approaches involve the use of memberships (or
degrees of satisfaction of criteria). Thus membership functions play a crucial role in fuzzy
set theoretic decision making.

There are three general approaches to constructing membership functions: i)
heuristic methods, ii) clustering methods, and iii) histogram-based methods. Heuristic
methods assume that the shape of the membership functions is known (e.g., trapezoidal,
triangular, or Gaussian). Heuristic methods have been used very successfully in control
approaches. In computer vision, heuristic membership functions may be used to describe
certain relational notions (such as above, below) and certain properties (such as lightness or
darkness of a pixel value, position of a pixel, narrowness of a region). However, heuristic
methods are not sufficiently flexible if one is to construct the membership functions from
training data, because assumptions on the shape of the membership function are too
limiting. In such cases, fuzzy clustering methods and histogram-based methods are more
useful. Clustering-based methods use fuzzy clustering techniques to obtain a fuzzy partition
of the training data. The membership values generated by the fuzzy partition are used for
decision making. Several fuzzy clustering techniques exist in the literature; however, this
approach will not be discussed here. We now describe membership generation techniques
based on histograms of feature (training) data.

In image processing applications, histograms have traditionally been treated as
probability distributions. In pattern recognition, many methods exist for estimating pdfs
from samples. Since probabilities represent relative frequencies, it is reasonable to assume
that, given a large number of samples representing the ensemble, a histogram whose area
has been normalized to 1 approximates the pdf. Thus, methods that transform probabilities
to possibilities (membership values) can be used to generate membership functions from
histograms. To our knowledge, there are three ways to construct membership functions from
given probability density functions. All of these methods assume that we have the
probability density functions at hand before the methods are applied, or that we can
approximate them by histograms.



Method 1: 


This method, suggested by D. Dubois and H. Prade [1], is based on
probability/necessity measure theories.

Let $X = \{x_i \mid i = 1, \ldots, n\}$ be a universe of discourse. The $x_i$'s are ordered such that
$p_1 \ge p_2 \ge \cdots \ge p_n$, where $p_i = P(\{x_i\})$, $\sum_{i=1}^{n} p_i = 1$, and $P$ is a probability measure. $A_i$
denotes the set $\{x_1, x_2, \ldots, x_i\}$; $A_0 = \emptyset$ by convention.

Definition 1. The degree of necessity of event A is the extra amount of probability of
elementary events in A over the amount of probability assigned to the most frequent
elementary event outside A. In other words,

\[ N(A) = \sum_{x_j \in A} \max\Bigl(p_j - \max_{x_k \notin A} p_k,\ 0\Bigr) \tag{1} \]

Proposition 1. The set function $N$ satisfies the following axioms:

\[ N(\emptyset) = 0; \quad N(X) = 1 \tag{2} \]

\[ \forall A, B \subseteq X \quad N(A \cap B) = \min(N(A), N(B)) \tag{3} \]


Definition 2. Viewing $N(A)$ as the grade of impossibility of the opposite event $\bar{A}$, we can
define the grade of possibility of $A$ by

\[ \forall A \subseteq X \quad \Pi(A) = 1 - N(\bar{A}) \tag{4} \]

where $N$ is defined by (1).

Proposition 2. The set function $\Pi$ defined by (4) satisfies the axioms

\[ \Pi(\emptyset) = 0; \quad \Pi(X) = 1 \tag{5} \]

\[ \forall A, B \quad \Pi(A \cup B) = \max(\Pi(A), \Pi(B)) \tag{6} \]

Hence $\Pi$ is a possibility measure in the sense of Zadeh.


Denoting $\pi_i = \Pi(\{x_i\})$, we have $\Pi(A) = \max_{x_i \in A} \pi_i$,
so that $\Pi$ and $N$ are completely specified through the possibility distribution $(\pi_i \mid i = 1, \ldots, n)$,
which can be viewed as a normalized fuzzy set.

The $\pi_i$'s are easily obtained from the $p_i$'s since

\[ \pi_i = 1 - N(X - \{x_i\}) = 1 - N(A_{i-1}) = 1 - \sum_{j=1}^{i-1} (p_j - p_i) \quad \text{for } i > 1 \tag{7} \]

and $\pi_1 = 1$ (normalization). Using $\sum_{i=1}^{n} p_i = 1$ we get

\[ \forall i = 1, \ldots, n \quad \pi_i = i\,p_i + \sum_{j=i+1}^{n} p_j \tag{8} \]

Equation (8) gives us a way to generate membership functions from given probability
density functions.

It is easy to see from (8) that for $i = 1, \ldots, n-1$

\[ \pi_i - \pi_{i+1} = i\,(p_i - p_{i+1}) \tag{9} \]

and thus

\[ \pi_i = \pi_{i+1} \iff p_i = p_{i+1}, \qquad \pi_i > \pi_{i+1} \iff p_i > p_{i+1}, \]

i.e., the possibility distribution and the probability density have the same shape.
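Equation (8) translates directly into a short routine. The Python sketch below is one possible rendering, assuming the input is a list of probabilities (for example, a normalized histogram) in arbitrary order; the function name `dubois_prade_possibility` is hypothetical. The values are sorted internally and the result is returned in the original bin order.

```python
def dubois_prade_possibility(p):
    """Map probabilities p to possibilities via pi_i = i*p_i + sum_{j>i} p_j.

    The formula assumes p sorted in non-increasing order, so we sort,
    apply it, and then undo the sort.
    """
    order = sorted(range(len(p)), key=lambda k: -p[k])
    sorted_p = [p[k] for k in order]
    pi_sorted = [0.0] * len(p)
    tail = 0.0  # running sum of p_j for j > i
    for idx in range(len(p) - 1, -1, -1):
        pi_sorted[idx] = (idx + 1) * sorted_p[idx] + tail
        tail += sorted_p[idx]
    pi = [0.0] * len(p)
    for rank, k in enumerate(order):
        pi[k] = pi_sorted[rank]
    return pi

possibilities = dubois_prade_possibility([0.2, 0.5, 0.3])
```

For the probabilities 0.5, 0.3, 0.2 the formula gives 1.0, 0.8, 0.6, so the most probable element always receives possibility 1.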

Proposition 3. If the probability assignment $p$ maps onto the possibility distribution $\pi$ via (8),
then

\[ \forall A \quad N(A) \le P(A) \le \Pi(A) \]

Therefore, the possibility distribution so defined satisfies Zadeh's probability/possibility
consistency principle.



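Proposition 4. The possibility distribution $\pi_i$ is greater than or equal to the normalized
probability distribution, that is,

\[ \forall i = 1, \ldots, n \quad \pi_i \ge \frac{p_i}{p_1} \quad (p_1 = p_{\max}) \]

Proof: From Eq. (8), $\pi_i = i\,p_i + \sum_{j=i+1}^{n} p_j$. Consider

\[ \frac{p_1}{p_i}\,\pi_i = i\,p_1 + \frac{p_1}{p_i} \sum_{j=i+1}^{n} p_j . \]

Note that $i\,p_1 \ge \sum_{j=1}^{i} p_j$ and $\frac{p_1}{p_i} \sum_{j=i+1}^{n} p_j \ge \sum_{j=i+1}^{n} p_j$.
Therefore $\frac{p_1}{p_i}\,\pi_i \ge \sum_{j=1}^{n} p_j = 1$, so $\pi_i \ge \frac{p_i}{p_1}$.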


Experimental Results:

We applied Method 1 to some simple pdfs: linearly decreasing,
triangular, trapezoidal, and normal. As shown in Figure 1-1, for the linear parts of the pdfs
the corresponding possibility distribution is quadratic, as
expected. Appendix A gives the closed-form solutions for the possibility
distributions corresponding to these simple pdfs. The shape of the possibility distribution for
a normal pdf is very similar to the original one.

We applied the method to the homogeneity feature of the space shuttle image in Figure 2-1(b).
In this experiment, we constructed histograms of the object (shuttle) and the background by
sampling pixels from Figure 2-1(b), and smoothed the histograms with a binomial window of
size 11. Figure 2-2 shows the smoothed histograms of object and
background and their corresponding possibility distributions. Note that the possibility
distributions are computed from normalized histograms (i.e., pdfs).



Method 2 : 


This method, suggested by Klir [2], is based on uncertainty measures in probability
and possibility theories. He claims that under the transformation, the values $p_i$ must
correspond to the values $r_i$ for all $i = 1, \ldots, n$ by some appropriate scale and, in addition, the
amount of information should be preserved.

In other words, the total amount of uncertainty in the probability distribution must
be equal to the total amount of uncertainty in the possibility distribution.

Let $p = (p_1, p_2, \ldots, p_n)$ and $r = (r_1, r_2, \ldots, r_n)$ denote, respectively, probability
and possibility distributions (defined on a finite set $X$ with $n$ elements) that do not contain
zero elements and are ordered in such a way that $p_i \ge p_{i+1}$ and $r_i \ge r_{i+1}$ for all $i = 1, 2, \ldots, n-1$.
That is, $p_i \in (0,1]$, $r_i \in (0,1]$, and $\sum_{i=1}^{n} p_i = 1$.

The probabilistic measure of uncertainty is the well-known Shannon entropy,

\[ H(p) = -\sum_{i=1}^{n} p_i \log_2 p_i \tag{1} \]

In possibility theory, two types of uncertainty coexist, nonspecificity and discord; their
measures are

\[ N(r) = \sum_{i=2}^{n} r_i \log_2 \left[\frac{i}{i-1}\right] \tag{2} \]

\[ D(r) = -\sum_{i=1}^{n-1} (r_i - r_{i+1}) \log_2 \left[1 - i \sum_{j=i+1}^{n} \frac{r_j}{j(j-1)}\right] \tag{3} \]

respectively.

Hence, the requirement that information be preserved by the transformation is expressed by
constraining the scaling between $p$ and $r$ by the equation

\[ H(p) = N(r) + D(r) \tag{4} \]

Klir contends that the log-interval scale transformation is the only one that exists for all
distributions and is unique.

Its form is

\[ r_i = (p_i / p_1)^{\alpha} \]

where $\alpha$ is a positive constant determined by solving Eq. (4) for the given $H(p)$.

From extensive computer experimentation, Klir conjectures that $\alpha \in [0,1]$. If the conjecture
is true, then $r_i \ge p_i$ for all $i = 1, \ldots, n$ is guaranteed (probability-possibility consistency
principle).
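The quantities in Eqs. (1)-(4) of this method can be evaluated numerically as in the Python sketch below. This is an illustration under stated assumptions rather than Klir's procedure: the function names are hypothetical, the nonspecificity and discord sums follow the forms written in Eqs. (2) and (3), and no particular root finder for the exponent is prescribed. The function `information_gap` returns H(p) - [N(r) + D(r)] for a trial exponent, so the exponent that preserves information is a zero of this function and can be located with any one-dimensional root search.

```python
import math

def shannon_entropy(p):
    return -sum(pi * math.log2(pi) for pi in p)

def nonspecificity(r):
    # r is ordered non-increasingly; i is the 1-based index used in Eq. (2).
    return sum(r[i - 1] * math.log2(i / (i - 1)) for i in range(2, len(r) + 1))

def discord(r):
    n = len(r)
    total = 0.0
    for i in range(1, n):
        inner = sum(r[j - 1] / (j * (j - 1)) for j in range(i + 1, n + 1))
        total -= (r[i - 1] - r[i]) * math.log2(1 - i * inner)
    return total

def klir_possibility(p, alpha):
    """Log-interval scale transformation r_i = (p_i / p_1) ** alpha."""
    sorted_p = sorted(p, reverse=True)
    return [(pi / sorted_p[0]) ** alpha for pi in sorted_p]

def information_gap(p, alpha):
    r = klir_possibility(p, alpha)
    return shannon_entropy(p) - (nonspecificity(r) + discord(r))

gap_uniform = information_gap([0.25, 0.25, 0.25, 0.25], alpha=0.7)
```

For a uniform distribution every r_i equals 1 for any exponent, the discord vanishes, and the nonspecificity equals log2(n), which matches the Shannon entropy, so the gap is zero.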

Experimental Results: 

We applied Method 2 to the previously defined pdfs. The results are also shown in
Figure 1-1. Our experiment shows that there are some cases where we may have $\alpha > 1$ for
the trapezoidal and the normal pdfs.

The same homogeneity feature as used in Method 1 was considered to compute the
possibility distributions, and all results are summarized in Figure 2-2. The membership
values computed by the two methods (Methods 1 and 2) are quite similar.

Method 3: 

The last method, which we are currently investigating, was suggested by Civanlar and
Trussell [3]. It is based on an optimization technique to find optimal membership functions
from a given pdf. They claim that in order to define a reasonable membership function,
there are certain conditions which can be imposed on the membership function to make the
set have properties consistent with the user's subjective judgement and the underlying pdf.
From a heuristic viewpoint, the elements which are most likely should have high
membership values; however, the possibility distribution should be as specific as
possible. These requirements are described quantitatively below:

1. $E\{\mu(x) \mid x \text{ is distributed according to the underlying pdf}\} \ge c$,
where the confidence level $c$ should be close to unity.

2. $0 \le \mu(x) \le 1$

3. $\int \mu^2(x)\,dx$ should be minimized to obtain a specific membership function.

The optimal membership function defined by these conditions can be derived using
constrained optimization techniques. They found that the optimal membership function is
given by

\[
\mu(x) =
\begin{cases}
\lambda p(x) & \text{if } \lambda p(x) < 1 \\
1 & \text{if } \lambda p(x) \ge 1
\end{cases}
\tag{1}
\]

where $p(x)$ is the pdf or its estimate derived from the histogram of the feature used for
defining the fuzzy set, and the constant $\lambda$ is to be solved from

\[ \lambda \int_{\lambda p(x) < 1} p^2(x)\,dx + \int_{\lambda p(x) \ge 1} p(x)\,dx - c = 0 \tag{2} \]

An interesting result they found is that the membership function corresponding to a Gaussian
with $c = 1/\sqrt{2}$ is a normalized Gaussian function with the highest value equal to 1.
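For a pdf sampled on a uniform grid, the constant in Eq. (2) can be found numerically, because the expected membership is a non-decreasing function of the constant. The Python sketch below uses bisection for this purpose; it is a minimal illustration and not the authors' procedure. The function name `optimal_membership`, the rectangle-rule discretization with bin width `dx`, and the bracket-doubling start are assumptions.

```python
def optimal_membership(p, dx, c, tol=1e-10):
    """Solve for lam such that E[min(lam * p(x), 1)] = c; return (lam, mu).

    p  : pdf values on a uniform grid (sum(p) * dx should be about 1)
    dx : grid spacing
    c  : desired confidence level, 0 < c < 1
    """
    def expected_membership(lam):
        return sum(min(lam * pk, 1.0) * pk * dx for pk in p)

    lo, hi = 0.0, 1.0
    while expected_membership(hi) < c:  # grow the bracket until it contains c
        hi *= 2.0
        if hi > 1e12:
            raise ValueError("confidence level c is not attainable")
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if expected_membership(mid) < c:
            lo = mid
        else:
            hi = mid
    lam = hi
    return lam, [min(lam * pk, 1.0) for pk in p]

# Uniform pdf on [0, 1] sampled at four bins of width 0.25.
lam, mu = optimal_membership([1.0, 1.0, 1.0, 1.0], dx=0.25, c=0.5)
```

In this uniform example the expected membership equals the constant itself as long as it is below 1, so a confidence level of 0.5 yields a constant, and a flat membership value, of about 0.5.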


APPENDIX A. 


Some simple pdfs and the closed-form solutions for their corresponding possibility distributions.
1. Linearly decreasing case: 



2. Triangle case: 



pi = /-ri + a(a ^h /c for i is odd number. 

pi = [-t i +a]/c for i is even number where c = n-1 ). 

^2 2 y\(t\ 2 j 

7Tf = - , + - — —j i + „ " for i is odd number. 

(n-l) A (n-l) z (n-lr 


3. Trapezoid case : 



pi = [ i + J/d for i (>a+l) is odd number. 

h hb 

pi = [ i + ^)/d for i(> a) is even number where d = (a+b)h/2. 


it i = + 2i - n.2 + 2bn] for i(>a+l) is odd number. 


Appendix A. Probability distributions and corresponding possibilities generated by the method
of D. Dubois and H. Prade. Note that $p_i$ and $\pi_i$ are arranged in non-increasing order.
Image Segmentation via Binary Encoding


1. Introduction 

In this proposal, a binary encoding method is introduced for use in image
processing, in particular for image segmentation. Given an image with $2^n$ gray levels that
consists of objects and background, each pixel value can be encoded by $n$ bits
(e.g., for an image that has 256 gray levels, 8 bits are needed).
Based on this representation, a Boolean function $f$ in
minimum sum-of-products (MSP) form can be obtained by means of the Quine-McCluskey
tabular method to segment the objects from the background. Section 2 explains in greater
detail how this function is obtained. The performance of the method is tested on various
images, as shown in Section 3.

2. Binary encoding method 

Given $n$ binary-valued inputs $(x_1, \ldots, x_n)$, there exist $2^n$ combinations of the input
variables. If we consider a binary output defined on the combinations of the input
variables, a minimum sum-of-products (MSP) function $f$ that describes the
output can be obtained by means of the Quine-McCluskey tabular method. For purposes of image
segmentation, given an image with $2^n$ gray levels that consists of objects and background, the
gray levels can be encoded into $n$ bits. If we consider the $n$ bits as $n$ input variables and the
output to be either object or background, a function $f$ that segments objects from the
background can be obtained by using the method described above. The coded inputs are
obtained as shown in the table below.




gray level      input variables
                x_1 ... x_n

0               0 ... 0 0

1               0 ... 0 1

...             ...

2^n - 1         1 ... 1 1


The following Section shows various examples using the binary encoding method with 
resulting segmented images. 

3. Examples 

Figure 1(a) shows a 200x200 image of the space shuttle against a background of
clouds. This image has 256 gray levels, which corresponds to an 8-bit encoding. Our
objective is to segment the shuttle from its background using the binary encoding method
described in Section 2. For the training data, a window of size 20x20 is used to take coded
gray values from both the shuttle and the background, constituting 10% of the total image.
Gray values that occur in neither the shuttle nor the background samples are treated as
"don't cares". Gray values that occur in both the shuttle and the background samples
are also treated as "don't cares". Letting the shuttle and the background represent the
output values 1 and 0, respectively, the function $f$ that segments the object from the
background is as follows:

- *1*2 + *2*3*4 + *1*3 + *4*5*6*7*8 + *3*5*6*7*8 +*2*3*6*7 
+ *2X3*6*7 + *2*3*8 + *1*4*5*6 + *1*5*6*7*8 + *1*4*5*7 • 
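The way the training samples are turned into the truth table that the Quine-McCluskey method minimizes can be sketched in Python as follows. This is only an illustration of the encoding and the "don't care" rules described above; the function names `encode` and `build_truth_table` and the small 3-bit example are hypothetical, and the minimization step itself is not shown.

```python
def encode(gray, n_bits=8):
    """Return (x_1, ..., x_n) for a gray level, with x_1 the most significant bit."""
    return tuple((gray >> (n_bits - 1 - b)) & 1 for b in range(n_bits))

def build_truth_table(object_levels, background_levels, n_bits=8):
    """Split the 2**n_bits gray levels into on-set, off-set and don't-cares.

    Levels seen in both classes, or in neither, become don't-cares.
    """
    obj, bg = set(object_levels), set(background_levels)
    ambiguous = obj & bg
    on_set = sorted(obj - ambiguous)   # output 1 (object)
    off_set = sorted(bg - ambiguous)   # output 0 (background)
    dont_care = sorted(set(range(2 ** n_bits)) - set(on_set) - set(off_set))
    return on_set, off_set, dont_care

# Toy 3-bit example: level 4 is seen in both classes, levels 2 and 3 in neither.
on_set, off_set, dont_care = build_truth_table([4, 5, 6, 7], [0, 1, 4], n_bits=3)
codes = [encode(g, n_bits=3) for g in on_set]
```

The on-set and don't-care lists are exactly the minterm inputs a tabular minimizer expects, so they could be passed to any Quine-McCluskey implementation.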

The remaining 90% of the image was used for testing, and the resulting segmented image,
obtained using the function $f$ above, is shown in Figure 1(b). As expected, the results show
a poor segmentation of the image. This is due to the training data, which contained several gray
levels that were the same for both the shuttle and the background. This problem can also arise
in many other segmentation methods. In order to improve the segmentation, several
features were calculated. Figure 1(c) shows the image of the homogeneity feature, and the
function $f$ that segments the object from the background using this image is as
follows:


f(xi , . . . yX n ) = *l + *2*3*4*5*6 + *2*3*5*6*8 • 

Figure 1(d) shows a great improvement in the results using the homogeneity feature. Two
other features, namely entropy and contrast, were calculated, and their respective images
are shown in Figure 2(a) and Figure 2(c). The functions that segment the object from the
background using each corresponding image are as follows:

For the entropy feature, 

f{x l,...,x„) = X\X<\X-]X% + XiX4*6*7*8 + *1*3 + *1*2 

and for the contrast feature, 

f{x \,...,x n )=xi +X 2 + *3 • 

Figure 2(b) and Figure 2(d) show the segmented images obtained from the entropy and 
contrast feature images, respectively. These results also show a great improvement over the 
segmentation using the original image. 
Clustering Methodologies


The best way to describe the new work in this task is to include a copy of a
manuscript recently submitted by Dr. Krishnapuram and Dr. Keller to the IEEE
Transactions on Fuzzy Systems.

The title of the paper is:

"A Possibilistic Approach to Clustering"

This represents a radically new approach to the theory and practice of fuzzy
clustering.


A Possibilistic Approach to Clustering 


Raghu Krishnapuram and James M. Keller 
Department of Electrical and Computer Engineering 
University of Missouri, Columbia, MO 65211 


Abstract 

Clustering methods have been used extensively in computer vision and pattern recognition. 
Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering in that 
total commitment of a vector to a given class is not required at each iteration. Recently fuzzy 
clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also 
clusters which are actually "thin shells", i.e., curves and surfaces. Most analytic fuzzy clustering 
approaches are derived from Bezdek's Fuzzy C-Means (FCM) algorithm. The FCM uses the 
probabilistic constraint that the memberships of a data point across classes sum to one. This 
constraint was used to generate the membership update equations for an iterative algorithm. 
Unfortunately, the memberships resulting from FCM and its derivatives do not correspond to the 
intuitive concept of degree of belonging, and moreover, the algorithms have considerable trouble in 
noisy environments. In this paper, we cast the clustering problem into the framework of possibility 
theory. Our approach is radically different from the existing clustering methods in that the resulting 
partition of the data can be interpreted as a possibilistic partition, and the membership values may 
be interpreted as degrees of possibility of the points belonging to the classes. We construct an 
appropriate objective function whose minimum will characterize a good possibilistic partition of the 
data, and we derive the membership and prototype update equations from necessary conditions for 
minimization of our criterion function. We illustrate the superiority of the resulting family of 
possibilistic algorithms (particularly in the presence of noise) with several examples. 




I. Introduction 


Clustering has long been a popular approach to unsupervised pattern recognition [1]. It has 
become more attractive with the connection to neural networks [2,3,4], and with the increased
attention to fuzzy clustering [5, 6,7, 8]. In fact, recent advances in fuzzy clustering have shown 
spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 
"thin shells", i.e., curves and surfaces [8,13]. One of the major factors that influences the 
determination of appropriate groups of points is the "distance measure" chosen for the problem at 
hand. Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering in 
that total commitment of a vector to a given class is not required at each iteration. 

Most analytic fuzzy clustering approaches are derived from Bezdek's Fuzzy C-Means 
(FCM) algorithm [14]. The FCM uses the probabilistic constraint that the memberships of a data 
point across classes must sum to one. This constraint came from generalizing a crisp C-Partition of 
a data set, and was used to generate the membership update equations for an iterative algorithm. 
These equations emerge as necessary conditions for a global minimum of a least-squares type of 
criterion function. Unfortunately, the resulting memberships do not represent one's intuitive notion 
of degrees of belonging, i.e., they do not represent degrees of "typicality" or "possibility".

The following simple examples illustrate the problems associated with the probabilistic 
constraint as related to clustering. Consider the two clusters shown in Figure 1(a) with two 
outlying points A and B. Intuitively point A, being an outlier, should not have a high degree of 
membership in either cluster. Point B should have an even smaller membership in either cluster, 
because it only vaguely represents either one of them. Yet, the FCM assigns a membership of 0.5 
in the two clusters to both of them. Thus, the membership values are not only unrepresentative of 
the degree of belonging, but they cannot distinguish between a moderate outlier and an extreme 
outlier. Figure 1(b) represents another situation where there are two intersecting clusters. Here 
again, the probabilistic constraint in the FCM updates would force a membership of 0.5 in the two 
clusters to both point A and point B. This is again counterintuitive since point A is a "good" 
member of both clusters, whereas point B is a "poor" member. These situations arise because the
probabilistic memberships cannot distinguish between "equal evidence" and "ignorance". More 
recent theories such as belief theory [15] and possibility theory [16,17] have tried to correct this 
problem. Zadeh suggested that membership functions of fuzzy sets can be interpreted as possibility 
distributions [18]. 
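The effect described for points A and B can be reproduced with a few lines of Python. The sketch below assumes the standard FCM membership update, in which the membership of a point in cluster i is the reciprocal of the sum over k of (d_i^2 / d_k^2)^(1/(m-1)); that update formula is not stated in this section and is taken here as an assumption, as are the function name `fcm_memberships` and the example distances.

```python
def fcm_memberships(sq_dists, m=2.0):
    """FCM memberships of one point, given its squared distances to each prototype.

    Assumes all squared distances are strictly positive.
    """
    exponent = 1.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** exponent for dk in sq_dists) for di in sq_dists]

# A point equidistant from two prototypes, first nearby, then far away.
near = fcm_memberships([1.0, 1.0])
far = fcm_memberships([100.0, 100.0])
```

Both calls return memberships of 0.5 in each cluster, because only the ratios of the distances enter the update; the absolute distance, which separates a good member from an outlier, is lost.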

There is another important motivation for using possibilistic memberships. Like all 
unsupervised techniques, clustering (crisp or fuzzy) suffers from the presence of noise in the data. 
Since most distance functions are geometric in nature, noise points, which are often quite distant 
from the primary clusters, can drastically influence the estimates of the class prototypes, and 
hence, the final clustering. Fuzzy methods ameliorate this problem when the number of classes is 
greater than one, since the noise points tend to have somewhat smaller membership values in all the 
classes. However, this difficulty still remains in the fuzzy case, since the memberships of 
unrepresentative (or noise) points can still be significantly high. In fact, if there is only one real 
cluster present in the data, there is essentially no difference between the crisp and fuzzy methods. 
The prototype parameters (such as the center and orientation) and properties of the cluster (such as 
hypervolume) can be greatly affected by the noise in the data. Recently, Dave has suggested a 
heuristic method to improve the performance of the FCM algorithm and its derivatives in the 
presence of noise by including a noise cluster [19]. Although the results shown are good, this 
method introduces an artificial class and still suffers from the drawbacks due to the probabilistic 
constraint. His algorithm, for example, would assign a membership of about 0.5 in both classes to 
point A in Figure 1(b). 

On the other hand, if a set of feature vectors is thought of as the domain of discourse for a 
collection of independent fuzzy subsets, then there should be no constraint on the sum of the 
memberships. The only real constraint is that the assignments do really represent fuzzy 
membership values, i.e., they must lie in the interval [0,1]. In this paper we cast the clustering 
problem into the framework of possibility theory. Our approach is fundamentally different from the 
existing clustering methods in that the resulting partition of the data can be interpreted as a
possibilistic partition, and the membership values may be interpreted as possibility values, or 
degrees of typicality of the points in the classes. This is more in keeping with the concept of 
membership functions in fuzzy set theory. The possibilistic C-partition defines C distinct 
(uncoupled) possibility distributions (and the corresponding fuzzy sets) over the universe of 
discourse of the set of feature points. Thus, our approach is intrinsically fuzzy, in the sense that the 
memberships are not "hard" even when there is only one class in the data set. In section II, we 
construct an appropriate objective function whose minimum will characterize a good possibilistic 
partition of the data, and we derive the membership and prototype update equations from necessary 
conditions for minimization of our criterion function. These equations lead to an entirely new 
family of possibilistic clustering algorithms. In section III, we illustrate the superiority of the 
resulting family of possibilistic algorithms (particularly in the presence of noise) with several 
examples. Finally, section IV gives the summary and conclusions. 

II. Possibilistic Clustering Algorithms 

The original FCM formulation minimizes the objective function given by

\[ J(L,U) = \sum_{i=1}^{C} \sum_{j=1}^{N} (\mu_{ij})^m d_{ij}^2, \quad \text{subject to } \sum_{i=1}^{C} \mu_{ij} = 1 \text{ for all } j. \tag{1} \]

In (1), $L = (\lambda_1, \ldots, \lambda_C)$ is a C-tuple of prototypes, $d_{ij}^2$ is the distance of feature point $x_j$ to cluster
$\lambda_i$, $N$ is the total number of feature vectors, $C$ is the number of classes, and $U = [\mu_{ij}]$ is a $C \times N$
matrix called the fuzzy C-partition matrix [14] satisfying the following conditions:

\[ \mu_{ij} \in [0,1] \text{ for all } i \text{ and } j, \quad \sum_{i=1}^{C} \mu_{ij} = 1 \text{ for all } j, \quad \text{and} \quad 0 < \sum_{j=1}^{N} \mu_{ij} < N \text{ for all } i. \]

Here, $\mu_{ij}$ is the grade of membership of the feature point $x_j$ in cluster $\lambda_i$, and $m \in [1,\infty)$ is a
weighting exponent called the fuzzifier. In what follows, $\lambda_i$ will also be used to denote the $i$th
cluster, since it contains all of the parameters that define the prototype of the cluster.
Simply relaxing the constraint in (1) produces the trivial solution, i.e., the criterion
function is minimized by assigning all memberships to zero. Clearly, one would like the
memberships for representative feature points to be as high as possible, while unrepresentative
points should have low membership in all clusters. This is an approach consistent with possibility
theory [16]. The objective function which satisfies our requirements may be formulated as:

\[ J_m(L,U) = \sum_{i=1}^{C} \sum_{j=1}^{N} (\mu_{ij})^m d_{ij}^2 + \sum_{i=1}^{C} \eta_i \sum_{j=1}^{N} (1 - \mu_{ij})^m, \tag{2} \]

where the $\eta_i$ are suitable positive numbers. The first term demands that the distances from the feature
vectors to the prototypes be as low as possible, whereas the second term forces the $\mu_{ij}$ to be as
large as possible, thus avoiding the trivial solution. The choice of $\eta_i$ will be discussed later.


Theorem:

Suppose that $X = \{x_1, x_2, \ldots, x_N\}$ is a set of feature vectors, $L = (\lambda_1, \ldots, \lambda_C)$ is a
C-tuple of prototypes, $d_{ij}^2$ is the distance of feature point $x_j$ to the cluster prototype $\lambda_i$ ($i = 1,
\ldots, C$; $j = 1, \ldots, N$), and $U = [\mu_{ij}]$ is a $C \times N$ matrix of possibilistic membership values. Then $U$
may be a global minimum for $J_m(L,U)$ only if

\[ \mu_{ij} = \left[ 1 + \left( \frac{d_{ij}^2}{\eta_i} \right)^{\frac{1}{m-1}} \right]^{-1}. \]

The necessary conditions on the prototypes are identical to the corresponding conditions in the FCM and its
derivatives.

Proof:

In order to derive the necessary conditions and the membership updating equations, we
first note that the rows and columns of $U$ are independent of each other. Hence, minimizing
$J_m(L,U)$ with respect to $U$ is equivalent to minimizing the following individual objective function
with respect to each of the $\mu_{ij}$ (provided that the resulting solution lies in the interval [0,1]):

\[ J_m^{ij}(\mu_{ij}) = \mu_{ij}^m d_{ij}^2 + \eta_i (1 - \mu_{ij})^m. \tag{3} \]

Differentiating (3) with respect to $\mu_{ij}$ and setting it to zero leads to the equation

\[ \mu_{ij} = \frac{1}{1 + \left( \dfrac{d_{ij}^2}{\eta_i} \right)^{\frac{1}{m-1}}}. \tag{4} \]

It is obvious from (4) that $\mu_{ij}$ lies in the desired range. Since the newly added second term in the
objective function is independent of the prototype parameters and the distance measure, the
derivative of our new criterion function with respect to those parameters will be identical to that for
the FCM or the appropriate generalization. QED

Thus, in each iteration, the updated value of $\mu_{ij}$ depends only on the distance of $x_j$ from
$\lambda_i$, which is an intuitively pleasing result. The membership of a point in a cluster should be
determined solely by how far it is from the prototype of the class, and should not be coupled to its
location with respect to other classes. The updating of the prototypes depends on the distance
measure chosen, and will proceed exactly the same way as in the case of the FCM algorithm and its
derivatives, as will be explained shortly.

It is apparent from (4) that the constraints satisfied by the possibilistic C-partition are

$$\mu_{ij} \in [0,1] \text{ for all } i \text{ and } j, \quad \text{and} \quad 0 < \sum_{j=1}^{N} \mu_{ij} \le N \text{ for all } i.$$

Eq. (4) defines a possibility distribution (membership) function for cluster $\lambda_i$ over the domain of
discourse consisting of all feature points $x_j$. We denote this distribution by $\Pi_i$. The value of $m$
determines the fuzziness of the final possibilistic C-partition and the shape of the possibility
distribution. When $m \to 1$, the membership function is hard, and when $m \to \infty$, the memberships are
maximally fuzzy. A value of 2 for $m$ (which seems to give good results in practice) yields a very
simple equation for the membership updates.
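
The membership update can be illustrated with a minimal sketch. It assumes the form stated in the theorem, in which the membership is 1 / (1 + (d^2 / eta)^(1/(m-1))); the function name and the sample values of the squared distance and of eta are hypothetical and chosen only for illustration.

```python
def possibilistic_membership(d2, eta, m=2.0):
    """Membership of one point in one cluster.

    d2  : squared distance from the point to the cluster prototype
    eta : positive bandwidth parameter of that cluster
    m   : fuzzifier, must be greater than 1
    """
    if d2 < 0 or eta <= 0 or m <= 1:
        raise ValueError("need d2 >= 0, eta > 0 and m > 1")
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))


# With m = 2 the update reduces to 1 / (1 + d2 / eta).
# The membership equals 0.5 exactly where d2 == eta.
for d2 in (0.0, 50.0, 100.0, 400.0):
    print(d2, round(possibilistic_membership(d2, eta=100.0), 3))
```

Note that the value depends only on the distance to the cluster in question; no other cluster enters the computation.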

The value of $\eta_i$ determines the distance at which the membership value of a point in a
cluster becomes 0.5 (i.e., "the 3 dB point"), which occurs where $d_{ij}^2 = \eta_i$. Thus, it needs to be
chosen depending on the desired "bandwidth" of the possibility (membership) distribution for each
cluster. This value could be the same for all clusters, if all clusters are expected to be similar. In
general, it is desirable that $\eta_i$ relate to the overall size and shape of cluster $\lambda_i$. Also, it is to
be noted that $\eta_i$ determines the relative degree to which the second term in the objective function is
important compared to the first. If the two terms are to be weighted roughly equally, then $\eta_i$
should be of the order of $d_{ij}^2$. In practice we find that the following definition works best:
$$\eta_i = \frac{\sum_{j=1}^{N} \mu_{ij}^m \, d_{ij}^2}{\sum_{j=1}^{N} \mu_{ij}^m}. \qquad (5)$$

This choice makes $\eta_i$ the average fuzzy intra-cluster distance of cluster $\lambda_i$. The following rule
may also be used:


$$\eta_i = \frac{\sum_{x_j \in (\Pi_i)_\alpha} d_{ij}^2}{\left| (\Pi_i)_\alpha \right|}, \qquad (6)$$

where $(\Pi_i)_\alpha$ is an appropriate $\alpha$-cut of $\Pi_i$. In this case, $\eta_i$ is the average intra-cluster distance for
all of the "good" feature vectors (those vectors whose memberships are greater than or equal to $\alpha$).
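
The two rules for choosing eta can be sketched as follows. This is an illustrative reading of the text, not the authors' code: the first function computes the membership-weighted average of the squared distances, the second the plain average over the vectors whose membership is at least alpha. Function names and the sample data are hypothetical.

```python
def eta_fuzzy_average(mu_i, d2_i, m=2.0):
    """Average fuzzy intra-cluster distance for one cluster.

    mu_i : memberships of all points in this cluster
    d2_i : squared distances of all points to this cluster's prototype
    """
    weights = [u ** m for u in mu_i]
    return sum(w * d for w, d in zip(weights, d2_i)) / sum(weights)


def eta_alpha_cut(mu_i, d2_i, alpha=0.25):
    """Average squared distance over the alpha-cut (memberships >= alpha)."""
    good = [d for u, d in zip(mu_i, d2_i) if u >= alpha]
    if not good:
        raise ValueError("alpha-cut is empty")
    return sum(good) / len(good)


# Hypothetical cluster: three nearby points and one distant outlier.
mu = [1.0, 0.5, 0.5, 0.1]
d2 = [0.0, 4.0, 4.0, 100.0]
print(eta_fuzzy_average(mu, d2))
print(eta_alpha_cut(mu, d2, alpha=0.25))
```

In this toy case the outlier still contributes to the weighted average, whereas the alpha-cut rule ignores it entirely, which is consistent with using the second rule for the refined pass.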


The value of $\eta_i$ can be fixed for all iterations, or it may be varied in each iteration. When $\eta_i$
is varied in each iteration, care must be exercised, since it may lead to instabilities. Our experience
shows that the final clustering is quite insensitive to large (an order of magnitude) variations in the
values of $\eta_i$, although the final shapes of the $\Pi_i$ do depend on the exact values of $\eta_i$. Thus, the
best approach is to compute approximate values for the $\eta_i$ based on an initial fuzzy partition using
(5), and after the algorithm converges, recompute more accurate values for the $\eta_i$ using (6) and
run the algorithm for the second time. The second run typically converges in a couple of iterations.
This is only necessary if the actual values of class memberships are required. If only the relative
degree of strength is needed (for example, to generate parameters for the cluster or to produce a
hard partition), then this final step can be omitted. The second pass through the algorithm with
refined values for $\eta_i$ allows the resultant memberships in a noisy environment to be nearly
identical to those obtained in a noise-free state. Any value of $\alpha$ between 0.1 and 0.4 seems to yield
consistent results.


We propose a family of possibilistic clustering algorithms whose general form is as follows.


THE POSSIBILISTIC CLUSTERING ALGORITHM:

    Fix the number of clusters C; fix m, 1 < m < ∞;
    Set iteration counter l = 1;
    Initialize the possibilistic C-partition U^(0)
        (using a suitable fuzzy clustering algorithm);
    Estimate η_i using (5);
    Repeat
        Update the prototypes using U^(l) as indicated below;
        Compute U^(l+1) using (4);
        Increment l;
    Until ( || U^(l-1) - U^(l) || < ε );
    {Reestimate η_i using (6) and rerun the repeat loop if required}
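
The general loop can be sketched in Python for the simplest case. The sketch below makes several assumptions that are not in the text: it uses the squared Euclidean distance, updates each center as the membership-weighted mean of the points, takes the initial centers and the eta values as inputs (instead of deriving them from a fuzzy clustering run), and keeps eta fixed. All names and data are hypothetical.

```python
def pcm(points, centers, etas, m=2.0, eps=1e-6, max_iter=100):
    """Possibilistic C-Means sketch with squared Euclidean distance.

    points  : list of feature vectors (tuples of floats)
    centers : initial centers, e.g. taken from a fuzzy c-means run
    etas    : one positive eta per cluster, held fixed in this sketch
    """
    def sqdist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    def memberships(cs):
        # Each (cluster, point) pair is evaluated independently of the others.
        return [[1.0 / (1.0 + (sqdist(x, c) / eta) ** (1.0 / (m - 1.0)))
                 for x in points] for c, eta in zip(cs, etas)]

    centers = [tuple(c) for c in centers]
    u = memberships(centers)
    dim = len(points[0])
    for _ in range(max_iter):
        # Center update: membership-weighted mean of all points.
        new_centers = []
        for row in u:
            w = [v ** m for v in row]
            total = sum(w)
            new_centers.append(tuple(
                sum(wj * x[k] for wj, x in zip(w, points)) / total
                for k in range(dim)))
        centers = new_centers
        new_u = memberships(centers)
        change = max(abs(a - b)
                     for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if change < eps:
            break
    return centers, u


# Hypothetical data: two compact clusters plus one distant noise point.
data = [(50.0, 150.0), (60.0, 150.0), (70.0, 150.0), (60.0, 140.0), (60.0, 160.0),
        (130.0, 150.0), (140.0, 150.0), (150.0, 150.0), (140.0, 140.0), (140.0, 160.0),
        (100.0, 250.0)]
centers_out, u_out = pcm(data, [(55.0, 145.0), (145.0, 155.0)], [100.0, 100.0])
print(centers_out)
print(u_out[0][-1], u_out[1][-1])  # memberships of the noise point
```

Because the noise point receives a very small membership in both clusters, its weight in the center update is negligible in this toy example.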


The updating of the prototypes depends on the distance measure chosen. Different distance
measures lead to different algorithms. If the distance is an inner product induced norm metric as in
the case of the FCM algorithm, i.e., if $d_{ij}^2 = (x_j - c_i)^T A_i (x_j - c_i)$, where $c_i$ is the center of cluster $\lambda_i$,
updating of the prototype is achieved by [14]

$$c_i = \frac{\sum_{j=1}^{N} \mu_{ij}^m \, x_j}{\sum_{j=1}^{N} \mu_{ij}^m}. \qquad (7)$$



This gives us the Possibilistic C-Means (PCM) algorithm. If the distance measure is the scaled
Mahalanobis distance [6,7,20], i.e., if $d_{ij}^2 = |F_i|^{1/n} (x_j - c_i)^T F_i^{-1} (x_j - c_i)$, where $F_i$ is the fuzzy
covariance matrix of cluster $\lambda_i = (c_i, F_i)$ and $n$ is the dimension of the feature space, then the
center $c_i$ is still updated using (7), and the fuzzy covariance matrix is updated using

$$F_i = \frac{\sum_{j=1}^{N} \mu_{ij}^m \, (x_j - c_i)(x_j - c_i)^T}{\sum_{j=1}^{N} \mu_{ij}^m}. \qquad (8)$$


This gives us the Possibilistic C-Planes (PCP) algorithm. In the case of spherical shell clusters
[9,13], one possible distance measure is $d_{ij}^2 = d^2(x_j, \lambda_i) = (\|x_j - c_i\|^2 - r_i^2)^2$, where $c_i$ is the center
and $r_i$ is the radius of cluster $\lambda_i$, and the updating of the prototypes is given by

$$p_i = -\tfrac{1}{2} (H_i)^{-1} w_i, \qquad (9\text{-a})$$



where 


P,= 



The resulting algorithm may be called the Possibilistic C-Spherical Shells (PCSS) algorithm. A 
Possibilistic C-Quadric Shells (PCQS) algorithm [12,21] may also be defined similarly. 
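
As a hedged aside, the linear form of the spherical-shell update can be recovered by a standard weighted least-squares argument. The symbols $y_j$, $H_i$ and $w_i$ below are introduced here for illustration and are assumptions; they need not coincide with the definitions used in the original. The second term of the objective function does not depend on the prototype, so it drops out of the derivative.

```latex
\begin{align*}
d_{ij}^2 &= \left(\|x_j - c_i\|^2 - r_i^2\right)^2
          = \left(x_j^T x_j + p_i^T y_j\right)^2,
\qquad
p_i = \begin{bmatrix} -2c_i \\ c_i^T c_i - r_i^2 \end{bmatrix},\quad
y_j = \begin{bmatrix} x_j \\ 1 \end{bmatrix}, \\[4pt]
0 &= \frac{\partial}{\partial p_i}\sum_{j=1}^{N}\mu_{ij}^m\, d_{ij}^2
   = 2\sum_{j=1}^{N}\mu_{ij}^m\left(x_j^T x_j + y_j^T p_i\right) y_j
   = 2 H_i\, p_i + w_i, \\[4pt]
H_i &= \sum_{j=1}^{N}\mu_{ij}^m\, y_j y_j^T,
\qquad
w_i = 2\sum_{j=1}^{N}\mu_{ij}^m\,(x_j^T x_j)\, y_j
\quad\Longrightarrow\quad
p_i = -\tfrac{1}{2}\, H_i^{-1} w_i .
\end{align*}
```

Under these assumptions the center and radius follow from the solved vector: $c_i$ is $-\tfrac{1}{2}$ times the first $n$ components of $p_i$, and $r_i^2 = c_i^T c_i$ minus the last component of $p_i$.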


III. Examples of Possibilistic Clustering 

In this section, we show several examples of possibilistic clustering to illustrate the ideas 
presented in the previous section. We first present a simple example to provide insights into the 
possibilistic approach. We then present more realistic examples, and compare the performance of 
the possibilistic clustering with those of the corresponding hard algorithms, and those of FCM and 
its derivatives. 

The first example involves two well-separated clusters of seven points each. In this case, 
the hard C-Means algorithm, the FCM algorithm, and the PCM algorithm all give the same final 
crisp partition shown in Figure 2(a). The crisp partitions for the FCM and PCM are obtained by
assigning each feature vector to the cluster in which it has the highest membership. Ties are broken 
arbitrarily. The cluster centers in all three cases are the same. The membership values for the FCM 
and PCM cases are shown in Table 1. The feature vectors are numbered in the order in which they 




would be encountered in a top to bottom, left to right scan of the image shown in Figure 2(a). It 
can be seen that the FCM memberships are almost hard (i. e., they are close to 1 or 0) in every 
case. This may be desirable if a hard partition is required, but the memberships do not differentiate 
between close and far members of the clusters. On the other hand, the PCM algorithm provides 
more graded membership values, and these membership values are more in keeping with one's 
intuitive notion of belonging. Note that the farther away the feature vector is from the typical member
(i.e., the prototype), the smaller the membership. As we noted in the previous section, the rate of
fall of the membership values can be adjusted depending on the choice of $\eta_i$. However, this has
virtually no effect on the final clustering obtained. We chose to keep the computation of $\eta_i$ the
same for all examples, and this was done as explained in the previous section.

Figures 2(b), 2(c), and 2(d) show the final crisp partition obtained due to the hard C- 
Means, FCM, and PCM algorithms respectively, when two noise points are added to the set of 
feature vectors shown in Figure 2(a). The hard C-Means algorithm actually puts the farthest noise 
point as one cluster, and lumps all the rest into another cluster, although this may depend on 
initialization. The crisp partitions of the FCM and PCM are identical; however, the membership
values and the cluster centers obtained are considerably different, as can be seen in Table 2. The 
first two entries in the table correspond to the two noise points, and the FCM algorithm gives 
approximately equal memberships of 0.5 in both clusters for the noise points. This significantly 
affects the estimates of the cluster centers, as can be seen in the table. The PCM algorithm, on the 
other hand, gives very low memberships for the two noise points in either cluster, and the farther 
point has a lesser membership than the closer one, as desired. As a result, the cluster centers are 
virtually unchanged. The membership values of the points in each of the clusters are also virtually
unchanged in spite of the addition of the noise points. In fact, the memberships will not change 
even if an entire new cluster of feature points is added to the data set. This is a highly desirable 
result, especially if the clustering algorithm is to be used to estimate membership distribution 
functions for the various classes. 
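
The contrast between the two kinds of membership for noise points can be reproduced with a small sketch. The FCM formula used here is the standard one (memberships across clusters sum to 1); the cluster centers are those reported in Table 1, while the two noise-point locations and the value of eta are hypothetical choices made only for illustration.

```python
def sqdist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def fcm_memberships(d2, m=2.0):
    """Standard FCM memberships of one point, given its squared distances
    to all centers (all distances must be nonzero)."""
    e = 1.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** e for dk in d2) for di in d2]


def pcm_memberships(d2, etas, m=2.0):
    """Possibilistic memberships: each cluster is treated on its own."""
    e = 1.0 / (m - 1.0)
    return [1.0 / (1.0 + (di / eta) ** e) for di, eta in zip(d2, etas)]


centers = [(60.0, 150.0), (140.0, 150.0)]   # centers from Table 1
etas = [100.0, 100.0]                       # hypothetical bandwidths
results = {}
for name, point in (("near", (100.0, 200.0)), ("far", (100.0, 400.0))):
    d2 = [sqdist(point, c) for c in centers]
    results[name] = (fcm_memberships(d2), pcm_memberships(d2, etas))
    print(name, results[name])
```

Both points are equidistant from the two centers, so the FCM memberships are 0.5 and 0.5 regardless of how far away the point lies, whereas the possibilistic memberships are small and decrease with distance.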
Figure 3 shows a more realistic example with two classes. Each class has 25 feature
vectors in the noise-free case. These points were generated with a Gaussian random number 
generator. Figure 3(a) shows the clustering obtained by the Hard C-Means, Fuzzy C-Means, and 
Possibilistic C-Means algorithms. The crisp partitions are identical. Figures 3(b), 3(c) and 3(d) 
show the crisp partition resulting from the Hard C-Means, Fuzzy C-Means, and Possibilistic C-Means
algorithms when the data from class 2 (the lower class) are noisy. The crisp partition due to
the Hard C-Means is quite poor, and the crisp partition due to the Fuzzy C-Means is not
satisfactory either. The performance of the possibilistic C-Means is quite acceptable. The cluster 
centers for the three methods for the noise-free and noisy cases are shown in Table 3. As can be 
seen, the center estimates are poor in the cases of the Hard and Fuzzy C-Means algorithms. 

Figure 4 shows an example involving linear clusters. Figure 4(a) shows the clustering due 
to the Hard C Planes (HCP) [22,23], Fuzzy C Planes(FCP) [6,7], and Possibilistic C Planes 
(PCP) algorithms. The final estimates of the prototypes are shown superimposed on the original 
data set. The results are identical when there is no noise. Figure 4(b), 4(c), and 4(d) show the 
results of the HCP, FCP, and PCP algorithms respectively, when noise is added. As can be seen, 
the results of both the HCP and the FCP algorithms are quite poor. However, the results of the 
PCP algorithm are virtually the same as those of the noise-free case.

The fourth example involves the detection of circles. Figure 5(a) shows the original data set 
with two circles. Figures 5(b), 5(c), and 5(d) show the results of the Hard C Spherical Shells
(HCSS) [13], Fuzzy C Spherical Shells (FCSS) [9,13], and Possibilistic C Spherical Shells (PCSS)
algorithms respectively, with the final estimates of the prototypes superimposed on the original 
data set. As can be seen, the results of the HCSS algorithm are very poor. (They correspond to a 
local minimum). The FCSS and PCSS algorithms give the same results in this case. Figure 6 
shows the results of the same data set when noise is added. The performance of the HCSS 
algorithm is again unacceptable. The FCSS algorithm performs better; however, the estimates of
the centers and the radii suffer from the presence of noise. The results of the PCSS algorithm are
virtually unaffected by noise. 

The greatest difference between the FCM-based and PCM-based algorithms is for the case 
where there is but one cluster in the data set. In this case there is essentially no difference between 
the FCM-based methods and hard methods. Figure 7 illustrates this idea. Figure 7(a) and 7(c) 
show the estimates of the prototype parameters for a noisy line and a noisy circle when the FCP 
and FCSS algorithms are used. The estimates are severely affected by noise. Figures 7(b) and 7(d) 
show the clearly superior estimates with the PCP and PCSS algorithms. 

IV. Conclusions 

In this paper, we present a possibilistic approach to objective-function-based clustering.
We argue that the existing fuzzy clustering methods do not provide intuitively appealing 
membership values due to the fact that an inherently probabilistic constraint is used. As a result, 
membership of a feature vector in a cluster depends not only on where the feature vector is located 
with respect to the cluster, but also on how far away it is with respect to other clusters. This 
"conservation of total membership" law forces the memberships to be spread across the classes, 
and thus makes them dependent on the number of clusters present. The resulting membership 
values cannot always distinguish between good members and poor members. This situation arises 
because probabilistic membership values cannot distinguish between "equally likely" and 
"unknown". On the other hand, if one takes the possibilistic view that the membership of a feature 
vector in a class has nothing to do with its membership in other classes, then we can achieve more 
realistic membership distributions. Our possibilistic approach to clustering is based on this idea. 

Since our membership functions correspond more closely to the notion of typicality, the 
resulting algorithms are naturally more immune to noise. Thus, our approach is intrinsically fuzzy, 
since the memberships are not "hard" even when there is only one class in the data set. This is 
compatible with the fuzzy set theoretic notion of membership functions. The partition of the data 
resulting from our approach can be interpreted as a possibilistic partition, and the membership
values may be interpreted as possibility values, or degrees of typicality of the points in the classes. 
The possibilistic C-partition defines C distinct (uncoupled) possibility distributions (and the 
corresponding fuzzy sets) over the universe of discourse of the set of feature points. Therefore, the 
family of algorithms we propose can be used to estimate possibility distributions directly from 
training data. Currently there are no good algorithms to estimate possibility distributions directly 
from training data, other than those that do so by converting probabilities to possibilities [24,25]. 
This conversion does not yield very appropriate results when the FCM-based memberships are 
used, since the memberships do not have a frequency interpretation, and since the memberships 
have already lost the distinction between "equally highly likely" and "equally highly unlikely". The 
possibilistic approach has the added advantage of being a natural mechanism to assign "fuzzy 
labels" to training data for use in more sophisticated pattern recognition algorithms. Finally, we 
would like to point out that the possibilistic algorithms may be viewed as a generalization of the 
weighted least squares approaches [26] and robust parameter estimation methods [27], which have 
been used with good results in computer vision [28,29].
Table 1: Memberships and centers resulting from the FCM and PCM algorithms for the noise-free data
set shown in Figure 2(a).

             Fuzzy C-Means                     Possibilistic C-Means
Vector       Cluster 1       Cluster 2         Cluster 1       Cluster 2
1            0.996           0.004             0.632           0.007
2            0.004           0.996             0.007           0.632
3            0.988           0.012             0.300           0.005
4            0.997           0.003             0.631           0.006
5            1.000           0.000             1.000           0.007
6            0.996           0.004             0.632           0.008
7            0.980           0.020             0.300           0.009
8            0.020           0.980             0.009           0.300
9            0.004           0.996             0.008           0.632
10           0.000           1.000             0.007           1.000
11           0.003           0.997             0.006           0.631
12           0.012           0.988             0.005           0.300
13           0.996           0.004             0.632           0.007
14           0.004           0.996             0.007           0.632
centers      (60.0, 150.0)   (140.0, 150.0)    (60.0, 150.0)   (140.0, 150.0)

Table 2: Memberships and centers resulting from the FCM and PCM algorithms for the noisy data set
shown in Figures 2(c) and 2(d).

             Fuzzy C-Means                     Possibilistic C-Means
Vector       Cluster 1       Cluster 2         Cluster 1       Cluster 2
1            [illegible]     0.501             0.004           0.004
2            [illegible]     0.502             0.017           0.017
3            [illegible]     0.001             0.636           0.007
4            0.001           0.999             0.007           0.636
5            0.977           0.023             0.299           0.005
6            0.989           0.011             0.626           0.006
7            0.996           0.004             1.000           0.007
8            0.994           0.004             0.644           0.008
9            0.985           0.015             0.307           0.009
10           0.015           0.985             0.009           0.307
11           0.004           0.996             0.008           0.644
12           0.004           0.996             0.007           1.000
13           0.011           0.989             0.006           0.626
14           0.023           0.977             0.005           0.299
15           0.985           0.015             0.634           0.007
16           0.015           0.985             0.007           0.634
centers      (62.8, 145.9)   (137.2, 145.9)    (60.0, 150.0)   (139.9, 150.0)


Table 3: The estimates of centers using the HCM, FCM and PCM algorithms

              Hard C-Means                     Fuzzy C-Means                    Possibilistic C-Means
No noise      (102.0, 88.4)  (81.9, 118.5)     (102.0, 87.5)  (82.3, 118.1)     (101.7, 87.9)  (82.4, 117.4)
With noise    (92.2, 103.8)  (46.7, 155.8)     (96.4, 95.8)   (66.2, 139.4)     (98.7, 93.9)   (86.4, 112.6)

List of Figures


Figure 1: (a) Example of a data set with two noise points A and B in which the memberships
of the noise points resulting from the FCM algorithm in both clusters are about
0.5, even though point B is much less representative of either cluster than point A.
(b) Example of a data set with two intersecting clusters in which the memberships
of the points A and B resulting from the FCM algorithm in both clusters are about
0.5, even though point A is a "good" member of both clusters and point B is a
"poor" member of both clusters.

Figure 2: Results on a simple data set: (a) The crisp partition resulting from the HCM, FCM
and PCM algorithms. (b) The crisp partition from the HCM algorithm, when noise
is added. (c) The crisp partition from the FCM algorithm, when noise is added. (d)
The crisp partition from the PCM algorithm, when noise is added.

Figure 3: Results on a data set generated by a Gaussian random number generator: (a) The
crisp partition resulting from the HCM, FCM and PCM algorithms. (b) The
crisp partition from the HCM algorithm, when noise is added. (c) The crisp
partition from the FCM algorithm, when noise is added. (d) The crisp partition
from the PCM algorithm, when noise is added.

Figure 4: Estimation of parameters of lines (the lines generated from the estimated prototype
parameters are superimposed on the original data set): (a) Parameter estimates
obtained with the HCP, FCP and PCP algorithms when no noise is present. (b)
Parameters obtained with the HCP algorithm when noise is added. (c) Parameters
obtained with the FCP algorithm when noise is added. (d) Parameters obtained with
the PCP algorithm when noise is added.

Figure 5: Estimation of parameters of circles when no noise is present (the circles generated
from the estimated prototype parameters are superimposed on the original data set):
(a) Original data set. (b) Parameter estimates obtained with the HCSS algorithm. (c)
Parameter estimates obtained with the FCSS algorithm. (d) Parameters obtained
with the PCSS algorithm.

Figure 6: Estimation of parameters of circles in noise (the circles generated from the
estimated prototype parameters are superimposed on the original data set): (a)
Original data set. (b) Parameter estimates obtained with the HCSS algorithm. (c)
Parameter estimates obtained with the FCSS algorithm. (d) Parameters obtained
with the PCSS algorithm.

Figure 7: Estimation of prototype parameters in noise when only one cluster is present: (a)
Line parameters obtained with the FCP algorithm. (b) Line parameters obtained
with the PCP algorithm. (c) Circle parameters obtained with the FCSS algorithm.
(d) Circle parameters obtained with the PCSS algorithm.

Pose Estimation Using the UCQS Algorithm


In some cases, the Unsupervised C Quadric Shells (UCQS) algorithm can be used
to estimate the pose of the shuttle. The shuttle's image is taken from the back so that the
exhaust nozzles and the back edges of the three wings are apparent. Given an original
unrotated image, the exhaust nozzles can be parametrized by three circles, and the three
wings can be parametrized by three straight lines. These parameters are easily determined
by the UCQS algorithm. As the shuttle rotates, the shape of the nozzles will change from
circles to ellipses, and the orientation of the straight lines representing the three wings will
change as well. The UCQS algorithm is used to cluster this edge image and determine the
parameters of the ellipses and lines. Finally, these parameters can be used to solve for the
translation and rotation parameters, as long as the translation is made in the image plane. In
fact, depth information can also be derived from the change in the size of the nozzles.

The UCQS algorithm was used to cluster edge images of the back of the shuttle
model (see Figures C-1 through C-3). Fig. C-1 shows an image of the unrotated shuttle (the
reference position), and Figs. C-2 and C-3 show images of the rotated shuttle. In each
figure, panel (a) shows the original gray level image taken of the back of the shuttle, panel (b)
shows the corresponding edge image, and panel (c) shows the prototypes of the clusters found
by the UCQS algorithm. The UCQS algorithm not only clusters the image
correctly, but also determines the parameters of each cluster.


The equation of a quadric shell in the 2-D case (quadratic curve) can be written as
follows:

$$a_1 x_1^2 + a_2 x_2^2 + a_3 x_1 x_2 + a_4 x_1 + a_5 x_2 + a_6 = 0.$$

This equation can also be written as

$$x^T A x + x^T v + d = 0,$$

where $x$ is the feature vector $(x_1, x_2)^T$, and the parameters of the cluster are $A$, which is a
$2 \times 2$ matrix, $v$, which is a 2-element vector, and $d$, which is a real number. The UCQS
algorithm finds these parameters for Figures C-1 through C-3, and they are tabulated in
Tables C-1 through C-3.
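
The correspondence between the six coefficients and the matrix form can be sketched as follows. The sketch assumes that $A$ is taken to be symmetric, so the cross-term coefficient is split equally between the two off-diagonal entries; the text does not state this convention. The function names and the example circle are hypothetical.

```python
def quadric_to_matrix_form(a):
    """Map (a1, ..., a6) to (A, v, d) with A symmetric (an assumed convention)."""
    a1, a2, a3, a4, a5, a6 = a
    A = [[a1, a3 / 2.0], [a3 / 2.0, a2]]
    v = [a4, a5]
    return A, v, a6


def evaluate(A, v, d, x):
    """Evaluate x^T A x + x^T v + d; the result is zero for points on the curve."""
    quad = sum(x[r] * A[r][c] * x[c] for r in range(2) for c in range(2))
    return quad + x[0] * v[0] + x[1] * v[1] + d


# Circle of radius 2 centered at (1, 3):
# (x1 - 1)^2 + (x2 - 3)^2 - 4 = x1^2 + x2^2 - 2*x1 - 6*x2 + 6 = 0
coeffs = (1.0, 1.0, 0.0, -2.0, -6.0, 6.0)
A, v, d = quadric_to_matrix_form(coeffs)
print(evaluate(A, v, d, (3.0, 3.0)))  # a point on the circle
print(evaluate(A, v, d, (1.0, 3.0)))  # the center, which is inside the circle
```

A circle corresponds to $a_1 = a_2$ with $a_3 = 0$; as the shuttle rotates and a nozzle outline becomes an ellipse, these coefficients are the ones expected to change.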


Table C-1


Table C-2


Table C-3


Acquisition of Images

We have digitized several frames from a promotional video tape on the space 
program. The example which was presented in the feature calculation section is one of 
those images. In addition, we have constructed a device to hold a model of the space shuttle 
at a known orientation so that we can digitize it for the "pose estimation" research. We have 
shown several examples of those images in an earlier report. The report on camera 
calibration is included. These calibration equations are necessary to accurately determine
the pose parameters for the model as they appear in an image. 

Recently, we have received a tape from Lincom with many simulated shuttle images
in different orientations. We are in the process of devising experiments utilizing these 
images. 

Finally, we have arranged to borrow several video tapes of shuttle missions from 
the NASA library (we only recently found out such a facility existed). These, we hope, will 
supply us with a good set of real images to test our algorithms. 



Camera Calibration 


The most direct way of obtaining the image coordinates (x,y) of a world point w is
to apply the set of matrix equations defining the various parameters of the camera. These
parameters are the focal length, offsets, and angles of pan and tilt. They could be
measured directly, or we could use the camera itself as a measuring device
to estimate them. The latter technique is known as camera calibration.

The advantage of such a technique is that it eliminates the need to keep
recalibrating the camera whenever the camera is moved.

Procedure: 

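Define a matrix A containing all the camera parameters:

$$A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{bmatrix}$$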


Let W be a point in the Cartesian world coordinate system:

$$W = \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}$$

The homogeneous counterpart is defined as

$$W_h = \begin{bmatrix} kX \\ kY \\ kZ \\ k \end{bmatrix}$$



Let C represent the Cartesian coordinates of any point in the camera coordinate 

system 



or the corresponding homogeneous image plane vector form 




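Letting $k = 1$ we can write

$$\begin{bmatrix} C_{h1} \\ C_{h2} \\ C_{h3} \\ C_{h4} \end{bmatrix} =
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$
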


The camera coordinates in Cartesian form are given by

$$x = \frac{C_{h1}}{C_{h4}}, \qquad y = \frac{C_{h2}}{C_{h4}}.$$

Substituting $C_{h1} = x C_{h4}$ and $C_{h2} = y C_{h4}$ into the matrix equation above and expanding gives

$$x C_{h4} = a_{11} X + a_{12} Y + a_{13} Z + a_{14}$$
$$y C_{h4} = a_{21} X + a_{22} Y + a_{23} Z + a_{24}$$
$$C_{h4} = a_{41} X + a_{42} Y + a_{43} Z + a_{44}$$
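
The forward direction of these equations (from a known matrix A and a world point to image coordinates) can be sketched as follows. The matrix entries and the world point are made-up values chosen so the arithmetic is easy to follow; they do not come from any real camera, and the function name is hypothetical.

```python
def project(A, world):
    """Apply the 4x4 camera matrix A to a world point (X, Y, Z) with k = 1.

    Returns (x, y, C_h4), where x = C_h1 / C_h4 and y = C_h2 / C_h4.
    """
    X, Y, Z = world
    wh = (X, Y, Z, 1.0)  # homogeneous world point with k = 1
    ch = [sum(A[r][c] * wh[c] for c in range(4)) for r in range(4)]
    if ch[3] == 0:
        raise ZeroDivisionError("C_h4 is zero; the point has no finite image")
    return ch[0] / ch[3], ch[1] / ch[3], ch[3]


# Hypothetical camera matrix and world point, for illustration only.
A_example = [[2.0, 0.0, 0.0, 1.0],
             [0.0, 2.0, 0.0, 1.0],
             [0.0, 0.0, 1.0, 0.0],
             [0.0, 0.0, 0.5, 1.0]]
x, y, ch4 = project(A_example, (1.0, 2.0, 4.0))
print(x, y, ch4)

# The expanded equations hold: x * C_h4 equals the first row applied to (X, Y, Z, 1).
print(x * ch4, 2.0 * 1.0 + 0.0 * 2.0 + 0.0 * 4.0 + 1.0)
```

Calibration runs this relationship in reverse: with known world points and their measured image coordinates, the expanded equations become constraints on the unknown entries of A.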



